Abstract

Background: Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content.
Although grown in several temperate countries, its orphan-crop status has limited the development of genomic tools
to aid breeding efforts to improve yield and nutritional quality. In this study, we constructed
454 expressed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume
species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility in
diversity studies and their transferability to related species.

Results: Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for the L1 (young leaves, buds
and flowers) and L2 (immature seeds) EST libraries, respectively. A combined assembly (L1L2) yielded 71,655 contigs with an
average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs, of which 38,200 translated
into proteins and 8,741 were full length. Around 57% of L. luteus sequences had significant similarity with
at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all four of
these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs
contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR
candidate markers identified 222 polymorphic EST-SSRs. Two hundred and fifty-four (65.7%) and 113 (30%) SSR
primer pairs amplified fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic
EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected
several clusters among L. luteus accessions, strongly suggesting the existence of population
subdivisions. However, the clustering patterns did not clearly follow the accessions' origins.

Conclusion: Deep transcriptome sequencing of L. luteus will facilitate the further development of genomic tools and
lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw material for gene discovery,
identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for
genome comparisons, and putative candidate genes for QTL detection.
Background

L. luteus (2n = 52) is a member of the genistoid clade of the
Fabaceae, the third largest flowering
plant family, with over 700 genera and 20,000 species [1].
The genus Lupinus comprises more than 200 annual
and perennial herbaceous species, several of which are
cultivated and used as human food or animal feed [2].
Some of them show high levels of tolerance to biotic
and abiotic stresses. For instance, L. hispanicus, a wild
relative of L. luteus, is highly tolerant of diseases and
well adapted to poor soils, but has high levels of bitter
alkaloids and low agronomic yields [3]. Lupins are considered
to be of polyploid origin, and polyploidy probably played
a crucial role in the evolution of their ancestral genomes
[4,5]. The major cultivated species are the Old World
lupins L. albus (white lupin), L. angustifolius (narrow-leafed
lupin) and L. luteus (yellow lupin), and the New World
species L. mutabilis (pearl lupin or tarwi) [6].

L. luteus is widely distributed across the Mediterranean
region and has shallow soil requirements, and cultivated
accessions have variable seed yields in Mediterranean
environments [7]. In addition, yellow lupin seeds have
the highest protein content and twice the cysteine and
methionine content of most lupins [8,9]. However, despite
its high nutritional quality, there is a lack of genetic
and molecular tools to aid the breeding of
this species.

EST sequencing has accelerated gene discovery when
genome sequences are not available, facilitating gene
family identification and the development of molecular
markers. Next-generation sequencing has generated enormous
amounts of expressed sequence data for a wide
range of plant species, especially minor or orphan
crops [10]. For example, EST and genome sequencing of
lentil and chickpea would not have been feasible without
next-generation sequencing [11,12]. The lower cost and
greater sequence yield have allowed the identification of
candidate genes, even when they are expressed at low
levels [13,14].

Research on plants, animals and fungi has shown
that sequences of expressed genes are often widely
transferable among species, and even genera, allowing
genome-wide comparative mapping studies [15,16]. For
instance, combining orphan crop EST sequences
with the genetic and genomic resources of model plants, such
as Lotus japonicus (Japanese trefoil) and Medicago
truncatula (barrel medic), has revealed macro- and
micro-scale synteny, uncovered new genes and alleles,
and provided insights into genome evolution and duplication
[17,18]. Comparisons of ESTs and gene
sequences among several legume species have enabled
comparative genome studies between L. albus and
M. truncatula [19], and between L. angustifolius and Lotus
japonicus [20].



Several molecular markers have been developed for
Lupinus species, including RFLPs, ITAPs (intron-targeted
amplified polymorphic sequences), and AFLPs,
which have been used to build genetic linkage maps in
L. albus [19] and L. angustifolius [20,21]. So far, a limited
number of SSRs have been developed for Lupinus species,
and very few of these are EST-SSRs, i.e., SSRs that
are found in expressed sequences [21-23]. Genomic and
EST-SSRs have been widely used for the improvement of
major crop plants, but their initial development with
traditional methods requires significant research investment.
Now, an almost unlimited number of genomic
and EST-SSRs can be readily developed with next-generation
sequencing approaches for most crop species,
including orphan crops such as lupin [24-28]. Because
EST-SSRs lie in expressed sequences, these markers can be
annotated with putative functions by sequence homology,
and the genetic distance between marker and causal gene
is potentially reduced to 0 cM [29,30]. For
instance, the length of a dinucleotide SSR in the 5' UTR
of the waxy gene has been associated with amylose content
in rice [31,32]. EST-SSRs have also been associated with
several disease resistance genes in wheat and rice [33,34]
and with a number of agronomically important traits in cotton,
maize and narrow-leafed lupin [35-37].

In this study, we constructed 454-EST libraries, carried
out comparative studies between L. luteus and model
legume species, and mapped L. luteus expressed
sequences onto the M. truncatula chromosomes. Alignments
between our putative L. luteus genes and their
homologs in M. truncatula, coupled with amplification
of intergenic regions, provided evidence of microscale
synteny between the two species. In addition, we developed
EST-SSR markers and illustrated their utility within diverse
accessions of yellow lupin. Finally, because these
EST-SSR markers are gene-based, they are also likely to be
conserved among different lupin species, so we evaluated
their utility in two other Lupinus species,
L. mutabilis and L. hispanicus.

Methods 

Library construction and 454 sequencing 

cDNA libraries were constructed from mRNA isolated
from two tissue pools. Pool 1 (L1) included young leaves,
buds and flowers, and pool 2 (L2) included seeds at different
developmental stages. RNA from pools 1 and 2 was isolated
separately according to the guanidine hydrochloride
method [38]. The quality of both RNA samples was assessed by
inspecting rRNA bands on an Agilent Bioanalyzer (Agilent
Technologies, CA, USA).

cDNA libraries were normalized and prepared using
procedures for Roche 454 Titanium sequencing (Roche,
Branford, CT, USA). cDNAs from L1 and L2 were
synthesized using the Stratagene AccuScript High Fidelity
RT-PCR System (Agilent Technologies, CA, USA) and 5'
specific adaptors from Clontech. cDNA normalization
was used to improve coding sequence coverage, avoid AT
homopolymer artifacts, and reduce excessive 3' end transcript
sequence [39]. cDNAs from both libraries were
amplified using the Clontech Advantage HF system
(Clontech Laboratories, Inc.) and normalized using the
Evrogen Trimmer cDNA Normalization kit (Axxora,
LLC). These uncloned, normalized cDNA libraries were
prepared for pyrosequencing according to the manufacturer's
specifications. One 454 sequencing run was performed
for each EST library (454 Life Sciences, Roche).

Separate transcriptome assemblies of the L1 and L2 libraries
were created using Newbler (de novo sequence assembly
software from Roche 454 Life Sciences) with the
cDNA option. A third assembly (L1L2) was built
from the reads of both libraries to avoid sequence redundancy
when developing SSR markers. Reads were
initially assembled into contigs, and contigs into isotigs,
which correspond to alternative splice variants of a transcript.
Sequence read EST data for L1 and L2 are available
through the Sequence Read Archive (SRA055806).
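The assembly statistics reported in Table 2 (number of sequences, average length and %GC) can be recomputed from assembled contigs or isotigs. The Python sketch below is a minimal illustration under stated assumptions: it parses plain FASTA text, it is not part of Newbler, the function names are hypothetical, and %GC is pooled over all bases rather than averaged per sequence, which may differ from how the table was produced.

```python
from typing import Dict, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(text: str) -> Dict[str, float]:
    """Number of sequences, mean length (nt) and %GC pooled over all bases (assumption)."""
    n_seqs = total_bases = gc_bases = 0
    for _, seq in read_fasta(text):
        n_seqs += 1
        total_bases += len(seq)
        gc_bases += seq.count("G") + seq.count("C")
    return {
        "n_sequences": n_seqs,
        "mean_length": total_bases / n_seqs if n_seqs else 0.0,
        "percent_gc": 100.0 * gc_bases / total_bases if total_bases else 0.0,
    }


if __name__ == "__main__":
    toy = ">isotig00001\nATGCGC\nAT\n>isotig00002\nAAAATTTT\n"
    print(assembly_stats(toy))  # {'n_sequences': 2, 'mean_length': 8.0, 'percent_gc': 25.0}
```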

EST annotation, function and comparative genomics to other species

Comparing isotigs from the combined assembly (L1L2)
with the curated non-redundant protein database (nr,
www.ncbi.nlm.nih.gov; blastx, e-value < le"^^) provided a
functional annotation for each isotig. Alignments of
translated isotigs and proteins with an e-value < le'^^
were considered to have significant homology. Annotations
of the aligned proteins were extrapolated to annotate
our putative isotig sequences using Blast2GO (www.blast2go.org).
To directly compare the lupin isotigs with
the genes of other crops, blast searches were also used
to compare isotig translations with the Arabidopsis thaliana,
Glycine max, Medicago truncatula and Lotus japonicus
Gene Indices (tblastx, e-value < le'^^). Isotigs were also
annotated with Gene Ontology (GO) terms from
InterProScan (www.ebi.ac.uk).

In silico lupin EST mapping and microsynteny 

Blast was used to compare lupin EST isotigs with the Medicago
genome release 3.0 (e-value < le"^^, HSP identity 60% and
HSP length > 50 bp). The Blast results were visualized
using GBrowse 2.13 [40], with positive matches displayed
as feature tracks. The presence
of microsynteny was evaluated by PCR amplification of
putatively conserved chromosome blocks between L.
luteus and M. truncatula. Where alignments between
yellow lupin and M. truncatula were identified, specific
primer pairs were designed to amplify intergenic regions
(Additional file 1). These targeted intergenic regions
were PCR amplified from two L. luteus accessions and one
L. hispanicus accession in 20 µl reactions containing
100 ng of genomic DNA, 0.2 mM dNTPs, 2 mM MgCl2,
1X PCR buffer, 2.5% DMSO, 1 U
Taq polymerase (Agilent Technologies, Santa Clara, CA)
and 5 pmol of each forward and reverse primer. PCR
reactions were carried out following a touchdown protocol
on a Peltier thermal cycler (MJ Research, Inc.): 94°C
for 5 min; 5 cycles of 1 min at 94°C, 1 min at 55-65°C
(decreasing 1°C per cycle) and 2 min at 72°C; followed by
35 cycles of 1 min at 94°C, 1 min at 50-60°C and 2 min
at 72°C. Amplicons were purified from agarose gels and
sequenced. These amplified intergenic sequences were
mapped onto the M. truncatula genome and visualized
within a local implementation of GBrowse (Additional
file 1). Primer pairs that yielded positive microsynteny
amplifications were additionally tested against a screening panel consisting
of six diverse L. luteus accessions to search for polymorphisms
among yellow lupin genotypes (Additional
file 2).
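The touchdown program above can be written out cycle by cycle. This Python sketch assumes one reading of the protocol: each primer pair starts at its own annealing temperature within 55-65°C, the annealing temperature drops 1°C per cycle for 5 cycles, and the remaining 35 cycles anneal 5°C below the starting value (which maps the 55-65°C window onto the 50-60°C window). The function name and the per-primer starting temperature are hypothetical, and the initial 94°C, 5 min denaturation is not modeled.

```python
from typing import List, Tuple

Step = Tuple[str, float, int]  # (step name, temperature in C, duration in s)


def touchdown_program(t_start: float, td_cycles: int = 5, step: float = 1.0,
                      main_cycles: int = 35) -> List[List[Step]]:
    """Return one list of (step, temp, seconds) per cycle; assumed interpretation."""
    if not 55.0 <= t_start <= 65.0:
        raise ValueError("starting annealing temperature outside 55-65 C")
    cycles: List[List[Step]] = []
    # Touchdown phase: annealing temperature decreases by `step` every cycle.
    for i in range(td_cycles):
        cycles.append([("denature", 94.0, 60),
                       ("anneal", t_start - i * step, 60),
                       ("extend", 72.0, 120)])
    # Main phase (assumption): fixed annealing temperature one step below the last touchdown cycle.
    t_main = t_start - td_cycles * step
    for _ in range(main_cycles):
        cycles.append([("denature", 94.0, 60),
                       ("anneal", t_main, 60),
                       ("extend", 72.0, 120)])
    return cycles


if __name__ == "__main__":
    prog = touchdown_program(60.0)
    print([c[1][1] for c in prog[:6]])  # [60.0, 59.0, 58.0, 57.0, 56.0, 55.0]
```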

Identification of EST-SSRs 

SSR-containing lupin isotigs were identified using the
software MISA (MIcroSAtellite, http://www.pgrc.ipk-gatersleben.de/misa).
SSR search criteria varied according to repeat type. Di- and tri-nucleotide repeats were selected with
a minimum length of 12 and 15 nucleotides, respectively.
For tetra-, penta- and hexa-nucleotide repeats, the minimum length
was 20 nucleotides. Mononucleotide repeats were not
considered because of the homopolymer sequencing errors
associated with 454 technology. To estimate
the number of SSRs located in coding regions,
L1L2 sequences were analyzed using ESTScan (http://www.ch.embnet.org/software/ESTScan.html).
ORF discovery was carried out using default parameters, and the
putative coding sequences were scanned for SSR motifs using MISA.
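The SSR search criteria above are expressed as minimum tract lengths in nucleotides, which translate into minimum numbers of repeat units. The Python sketch below is not MISA; it is a hypothetical, regex-based approximation that converts each minimum length into a unit count by rounding up (so the 20 nt threshold for hexanucleotides becomes 4 units, i.e. 24 nt), skips motifs that are themselves repeats of a shorter motif, and ignores mononucleotide runs. Compound and interrupted repeats, which MISA can report, are not handled.

```python
import math
import re
from typing import List, NamedTuple

# Minimum tract length (nt) per unit size, taken from the criteria above.
MIN_LENGTH_NT = {2: 12, 3: 15, 4: 20, 5: 20, 6: 20}


class SSR(NamedTuple):
    start: int    # 0-based position of the first repeat unit
    unit: str     # repeat motif as it appears in the sequence
    repeats: int  # number of complete, uninterrupted repeat units


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(seq: str) -> List[SSR]:
    seq = seq.upper()
    hits: List[SSR] = []
    for k, min_nt in MIN_LENGTH_NT.items():
        min_reps = math.ceil(min_nt / k)  # e.g. 20 nt of a hexamer -> 4 units (24 nt)
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_reps - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            unit = m.group(1)
            if is_primitive(unit):
                hits.append(SSR(m.start(), unit, len(m.group(0)) // k))
                pos = m.end()
            else:
                # Homopolymers and multiples of shorter motifs are skipped;
                # mononucleotide SSRs are never reported, as in the Methods.
                pos = m.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "AT" * 7 + "GGT" + "AAC" * 5 + "G" + "A" * 15 + "CC"
    print(find_ssrs(demo))
```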

Among all selected SSR-containing isotigs, only
sequences with a motif of at least 7 repeat units were
considered for primer design. Flanking primer pairs were
designed using the Primer3 software available at NCBI
v.3.12, with expected amplicon lengths of 150-500 bp.
Oligonucleotides were synthesized by IDT (Integrated
DNA Technologies, Inc.).
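Primer design was restricted to SSRs with at least 7 repeat units and to amplicons of 150-500 bp. As a hedged sketch, assuming a standalone Primer3 2.x installation driven through its Boulder-IO input format (the tag names below belong to that format; older Primer3 releases used different names), each qualifying locus could be turned into a record asking Primer3 to place primers flanking the repeat tract. The `SSRLocus` type and `boulder_record` function are hypothetical, and the settings actually used through the NCBI interface may have differed.

```python
from typing import NamedTuple, Optional, Tuple


class SSRLocus(NamedTuple):
    isotig_id: str
    sequence: str
    start: int    # 0-based start of the repeat tract within the isotig
    unit: str
    repeats: int


def boulder_record(locus: SSRLocus, min_repeats: int = 7,
                   product_size: Tuple[int, int] = (150, 500)) -> Optional[str]:
    """Return a Primer3 Boulder-IO record, or None if the SSR has too few repeats."""
    if locus.repeats < min_repeats:
        return None
    tract_len = len(locus.unit) * locus.repeats
    if locus.start < 0 or locus.start + tract_len > len(locus.sequence):
        raise ValueError("repeat tract lies outside the isotig sequence")
    fields = [
        f"SEQUENCE_ID={locus.isotig_id}",
        f"SEQUENCE_TEMPLATE={locus.sequence}",
        # Primers must flank the target region, given as start,length (0-based by default).
        f"SEQUENCE_TARGET={locus.start},{tract_len}",
        f"PRIMER_PRODUCT_SIZE_RANGE={product_size[0]}-{product_size[1]}",
    ]
    return "\n".join(fields) + "\n=\n"  # a line containing only '=' ends the record
```

Records for all candidate loci can be concatenated into one file and passed to `primer3_core` on standard input; loci whose flanking sequence is too short simply return no primer pairs, which is one way a shortage of flanking sequence surfaces in practice.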

Evaluation and utility of EST-SSRs 

EST-SSR polymorphism and transferability were evaluated
on the germplasm screening panel described above
and on one accession each of L. hispanicus
and L. mutabilis.

DNA was extracted following standard procedures
[41], quantified using a Synergy HT Multi-Mode Microplate
Reader (BioTek Instruments, Winooski, VT), and
diluted to 50 ng/µl in TE buffer (10 mM Tris, 1 mM
EDTA, pH 7.5). DNA amplification was carried out in
20 µl PCR reactions as described above.

PCR products were separated on 6% denaturing polyacrylamide
gels, run in TBE buffer at 60 W for 3-4 hours,
and visualized using silver staining.
DNA amplicons of six EST-SSR primer pairs used in the
polymorphism screening were purified from agarose gels
and sequenced on an Applied Biosystems 3730xl DNA
Analyzer (Applied Biosystems, Carlsbad, CA).
Amplicon sequences from each EST-SSR primer pair
were aligned using Geneious version 5.5.3.0 (Biomatters
Ltd.) with default parameters.

Genetic diversity 

The polymorphic EST-SSRs were evaluated in sixty-four
L. luteus accessions from several origins (Poland,
Ukraine, the former Soviet Union, Spain, Germany, Morocco,
Belarus, Portugal, the Netherlands, Israel, Hungary,
and Chile; Additional file 2). Polish accessions were
kindly provided by W.K. Swiecicki, Institute of Plant
Genetics, Polish Academy of Sciences, Poznan. Our collection
of Chilean accessions is composed of improved
breeding lines adapted to the Chilean environment.
This Chilean germplasm originated from breeding
and selection of old European varieties for southern
Chilean environmental conditions. The remaining accessions were
obtained from the Western Regional Plant Introduction Station
(WRPIS), USDA-ARS, Washington State University,
Pullman, Washington, USA.
A sample of 50 polymorphic EST-SSRs was used to
genotype the sixty-four L. luteus accessions (Table 1).
Twenty-four EST-SSRs were identified from isotigs specific
to L1, 20 from isotigs specific to L2, and six from isotigs common
to both libraries. EST-SSR fragments of different
sizes were scored as different alleles and coded
with letters for each primer set. Genetic
relationships among L. luteus accessions were evaluated
using the neighbor-joining algorithm implemented in
PAUP* (v4.0b10). A distance tree was built and branch
support was estimated from 10,000 bootstrap replicates.
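EST-SSR alleles were coded as letters and the accessions were clustered by neighbor joining in PAUP*. The Python sketch below illustrates the idea only and is not PAUP*. It assumes a simple mismatch distance (the fraction of markers, scored in both accessions, that carry different alleles), treats unscored markers as missing, implements the standard Saitou-Nei neighbor-joining algorithm, and omits bootstrapping. The distance measure used in the study is not stated here, and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Genotype = Sequence[Optional[str]]  # one allele letter per marker; None = missing
Matrix = Dict[str, Dict[str, float]]


def mismatch_distance(g1: Genotype, g2: Genotype) -> float:
    """Fraction of markers scored in both accessions that carry different alleles."""
    pairs = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no marker scored in both accessions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(genotypes: Dict[str, Genotype]) -> Matrix:
    d: Matrix = {a: {a: 0.0} for a in genotypes}
    for a, b in combinations(genotypes, 2):
        d[a][b] = d[b][a] = mismatch_distance(genotypes[a], genotypes[b])
    return d


def neighbor_joining(d: Matrix) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree in Newick format."""
    nodes: List[str] = list(d)
    if len(nodes) < 3:
        raise ValueError("need at least three accessions")
    dist: Matrix = {a: dict(row) for a, row in d.items()}
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b); ties go to name order.
        _, a, b = min(((n - 2) * dist[a][b] - r[a] - r[b], a, b)
                      for a, b in combinations(nodes, 2))
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        new = f"({a}:{la:.4f},{b}:{lb:.4f})"
        dist[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[new][c] = dist[c][new] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Connect the last three nodes to a central node.
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


if __name__ == "__main__":
    genotypes = {  # hypothetical allele calls for three markers
        "acc1": ["a", "a", "b"],
        "acc2": ["a", "b", "b"],
        "acc3": ["b", "b", "a"],
        "acc4": ["b", "c", None],
    }
    print(neighbor_joining(distance_matrix(genotypes)))
```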

Results 

Seed and leaf-flower EST libraries 

Two runs of 454 pyrosequencing yielded 205 Mb and
530 Mb of sequence data for the L1 and L2 EST libraries,
respectively (Table 2). L1 produced 604,869 usable reads
that assembled into 26,975 contigs with an average
length of 468 nucleotides. L2 generated 1,345,892 usable
reads that assembled into 43,674 contigs with an average
length of 800 nucleotides. Careful inspection of the L1
contigs revealed lower percentages of coding regions,
higher A/T content, and twice as many A/T homopolymers
as in the L2 contigs. A combined assembly (L1L2) was created
to identify the genes common to both
tissues. In total, 1,964,517 reads were used in the L1L2 assembly,
forming 71,655 contigs with an average contig
length of 632 nucleotides. To reduce sequence redundancy
due to transcript and alternative splice variants,
L1L2 contigs were clustered into 55,309 isotigs, of which
38,200 translated into proteins and 8,741
were full length.

Functional classification and in silico comparative genomics

The assembled 454 isotigs represented putative transcriptional
products, i.e., functional genes. Blastx was
used to annotate the L1L2 putative genes (i.e., isotigs). A
total of 32,862 (59.5%) putative genes showed matches
with other species (<le'^°). Of these sequences, 20,169
(36.5% of all isotigs) showed high similarity to genes of other
plant species (<le'^°). GO annotations were grouped under
three categories: molecular function, biological process,
and cellular component (Figure 1). In total, 31,142 isotigs
were annotated with at least one molecular function,
11,894 with at least one cellular component and 22,842
with at least one biological process.

Blast was used to compare L1L2 with several model species
(tblastx; < le'^^; Figure 2). Around 57% (31,520) of
L. luteus sequences had significant similarity with at
least one sequence of Medicago, Lotus, Arabidopsis, or
Glycine, and 40.17% showed positive matches with all four of
these species.

In silico mapping of lupin ESTs on M. truncatula chromosomes

Alignment of L. luteus isotig sequences to the M. truncatula
genome (Blastn; <le'^°; MT3) was used to identify
local genomic variability between our ESTs and a
related, well-annotated reference genome sequence. The
alignments were visualized using GBrowse (v. 2.13), with
the Blast matches displayed as feature tracks. A total of
25,400 sequences (46%) from L1L2 had a positive match
with MT3, and the matches were distributed heterogeneously across the
M. truncatula chromosomes. Chromosomes 3 and 1 had
the highest (34,636) and lowest (16,055) numbers of
matches, respectively. Each L. luteus sequence was
mapped to an average of 3.7 positions on the Medicago
genome.
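The per-chromosome counts and the mean number of positions per mapped sequence reported above can be derived from tabular BLAST output. This Python sketch assumes BLAST+ `-outfmt 6` with its default twelve columns and applies the Methods criteria (e-value cutoff, HSP identity of 60% and HSP length above 50 bp). The e-value default is a placeholder to be set to the study's cutoff, each retained HSP is counted as one match (an assumption about how matches were counted), and the function names and chromosome identifiers are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-10,
                min_identity: float = 60.0,
                min_length: int = 50) -> Iterator[Dict[str, str]]:
    """Keep HSPs with e-value < max_evalue, identity >= min_identity (%)
    and alignment length > min_length (bp). max_evalue is a placeholder."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        if (float(hit["evalue"]) < max_evalue
                and float(hit["pident"]) >= min_identity
                and int(hit["length"]) > min_length):
            yield hit


def summarize(hits: Iterable[Dict[str, str]]) -> Tuple[Counter, int, float]:
    """Matches per chromosome, number of mapped queries, mean matches per mapped query."""
    per_chrom: Counter = Counter()
    per_query: Counter = Counter()
    for h in hits:
        per_chrom[h["sseqid"]] += 1
        per_query[h["qseqid"]] += 1
    n_mapped = len(per_query)
    mean_hits = sum(per_query.values()) / n_mapped if n_mapped else 0.0
    return per_chrom, n_mapped, mean_hits
```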

Occasionally, independent alignments of lupin genes
with the M. truncatula genome were close enough
to each other that primers could be designed to
hybridize to conserved exons, allowing amplification of the
intergenic sequences between adjacent lupin and M. truncatula
coding sequences (Figure 3). Positive PCR amplification
of intergenic regions using L. luteus genomic
DNA and primers anchored on conserved exonic
regions of adjacent M. truncatula genes suggested the
occurrence of microsynteny (i.e., conserved gene order)



Table 1 Characteristics of 50 EST-SSR primers developed in L. luteus. Shown for each primer pair are the library specificity, repeat motif, forward and reverse sequences, allele size range (bp), number of alleles, amplification in other lupin species, and annotation

Marker name | Library | Repeat motif | Forward primer (5'-3') | Reverse primer (5'-3') | Size (bp) | No. of alleles | Amplification | Annotation
Ill2itg33000 | L1 | (ACA)7 | CACGTCAGTCOTGCACCTA | GCACAGCAACAACAACACAA | 129-132 | 2 | L. hispanicus | -
Ill2itg51784 | L1 | ms | CATCOTCAAAAACCAmCAA | AATGTOATGAACGCGTGTG | 274-280 | 3 | - | -
Ill2itg52347 | L1 | (AT)8 | CTCATGmOTGGGTGGAAA | CAATCATGTCTAAACCGGGAA | 209-215 | 4 | - | -
Ill2itg50343 | L1 | (A^10 | ATATOGCGGCCATGCTG^ | TGTOATGTOGTOCAAGA | 235-239 | 3 | - | -
Ill2itg20858 | L1 | (AAC)12 | ACCCCAOTCTCCCAACTCT | TCCATGAATGAAATGGGG^ | 229-238 | 3 | L. hispanicus | Pollen-specific protein SF3
Ill2itg20038 | L1 | aA)9 | TCAGAAACAAAGGGGTOC | TCCAGAAATOTOTACATCCCA | 179-183 | 3 | - | -
Ill2itg52625 | L1 | (TCA)12 | CTGGTOTCTGTCGACTCCA | GACCAAGAAGTCAAGCTCGG | 109-124 | 4 | - | -
Ill2itg37631 | L1 | (CT)12 | TAAAGTGCCACCAACAAGCA | TOTGTOGTOTGTGTAGAGAGA | 133-155 | 6 | - | -
Ill2itg27097 | L1 | (AAT)7 | TOAACTACCGGTOAACCAC | GCCCAGAATOGGGTGCm | 206-209 | 2 | - | -
Ill2itg22424 | L1 | (GAA)7 | AAACGACCAACCGCATAAAG | GATGCGTGAAACTGCAAAGA | 240-249 | 3 | L. hispanicus | N-acetylglutamate synthase
2itg29703 | L1 | (GA)8 | ACCmGCGCCAAGATACAC | atotgacggmcactccc | 213-219 | 4 | - | -
2itg28437 | L1 | aA)9 | GGGCACAmGACTCmCG | tccgtgcaatgtcaatatcaa | 260-268 | 4 | L. hispanicus | -
2itg36804 | L1 | (ATA)12 | CACATGAGAAGCAGCAATGAA | atgcggtggagtggaagtaa | 254-260 | 2 | L. hispanicus | -
2itg21177 | L1, L2 | (CA^8 | COTGAGGCCAATAAATGGA | ™aggaagctagggccaca | 217-226 | 3 | L. hispanicus | Delta-8 sphingolipid desaturase
2itg39645 | L1 | (A^10 | AATCATGGCC^mGOTG | cgtotgctctggtotoc | 148-169 | 5 | - | -
2itg35309 | L1 | aA)8 | TOATGGCAAGAAAAACATCT | aatcatccatgccamaaca | 271-281 | 4 | - | -
2itg56943 | L1 | (GA)8 | GAGGCCCAAAAACAGAAACA | CCAmGCGTOGGTOTAT | 270-272 | 2 | L. hispanicus, L. mutabilis | -
2itg31693 | L1 | aA"08 | AGGGGCAAAGCTCAAAGACT | catoaca^atcctcatoactc | 196-217 | 4 | L. hispanicus | -
2itg10347 | L1 | (AT)8 | TGTGGTAAATGCAGGCTCAG | atgcaacgggaaccatagtc | 184-186 | 2 | L. hispanicus | -
2itg14618 | L1 | (CA^7 | TOCTCATCTCCCACACCTC | agotctgotgtaatcggc | 237-252 | 4 | - | -
2itg20466 | L1, L2 | aA)9 | GTAATCATOATGTATAATOTAACACTC | caatoa™tctgta™™cccc | 180-186 | 3 | - | Cytochrome B561
2itg53474 | L1, L2 | (GA)10 | CTGAAGTGAGGTOGGGAAG | tcaatcacacatgotgtoc | 230-234 | 3 | - | Cullin-1
2itg51894 | L1 | (A^10 | TGACmGATOmAGC™CAGG | tgaatgtcaaatgcaatatoagga | 247-263 | 3 | L. hispanicus | -
2itg24819 | L1 | (AT)8 | CATOATOTCTAATC^GTGTCA | taaagotgtctotgcccg | 219-244 | 5 | L. hispanicus | -
2itg55310 | L1 | aA)9 | ACCAAAAGGGTGGGTGAAAT | CCTAACAmGAACATAmAAAACAA | 277-283 | 4 | - | -
2itg14694 | L1, L2 | aA)8 | AAGTAGGAAGATCGAATATGAACG | GGGAAAATATCGAGG^CATC | 268-278 | 3 | L. hispanicus, L. mutabilis | RNA-binding protein
2itg35641 | L1 | (A^8 | AGTOCAATTCAACAACGCA | CATGCTCTATGGCAAGTGCT | 247-251 | 3 | - | -
2itg38340 | L1 | aA^7 | AGCTCCAC^AGAATOCG | TCTATOTOCATGCACATOTCCC | 164-173 | 4 | L. hispanicus | -
2itg26293 | L2 | aCCGAA)15 | CCTGCAGTGGTAGAACCTGG | GAAGCAAGGTCCACAGAAGG | 123-183 | 6 | - | 18S ribosomal RNA gene
1 12itg42878 | L2 | (CATOC)11 | CAACTOTGmGCAGACCG | GCTACCCmCGGGACTAGC | 217-235 | 4 | L. hispanicus, L. mutabilis | -
ll2itg13749 | L2 | (nCCGQS | 1 1 1 1 lACTCGACTCGCTCCC | CCAGTCGAmAGCAGTCGC | 207-261 | 7 | L. hispanicus, L. mutabilis | -
Il2itg32760 | L1, L2 | (CGGAAT)14 | tcataatgaa™aa™accccc | TCCCTGACTCTGTCmGGG | 146-284 | 14 | L. hispanicus | -
1 12itg00675 | L2 | (TC^8(TCG)5 | AGAGAGATCCTCmGACGCC | GTGGTOGCGAGAACCATCG | 187-199 | 4 | - | BSD domain-containing protein
Il2itg45631 | L2 | (ATC)10 | AAACCGAATOTGGATCAGC | GGGGACTCTGGAAAATCAGG | 146-155 | 3 | L. hispanicus, L. mutabilis | Alphavirus core protein family
1 12itg20349 | L2 | (AAC)7 | ACTAAGGGAAAGGGATOGG | CCAGGCAAGAACAAAAGAGG | 186-189 | 2 | L. hispanicus, L. mutabilis | LPA2 (low PSII accumulation 2)
Il2itg41827 | L2 | (TO)7 | TOAGTCATATCACCATAGCGG | CAACCACAAATGGAAAACCC | 242-245 | 2 | L. hispanicus, L. mutabilis | Lipase class 3 family protein
1l2itg47916 | L2 | ac^9 | GGTGGGTGAAAATGAAATGG | TAACCAAAATGGTOGTCGG | 241-247 | 2 | L. hispanicus, L. mutabilis | -
1 12itg42002 | L2 | (AAC)8 | OTGCAGGGTOTC™CAGC | GGGGI IGI 1 1 1 IGGTGTCC | 243-246 | 2 | L. hispanicus | -
1 12itg54849 | L2 | (ACA)7 | TOTCCAATGATGAAATGCC | TOACGGCTAAATACCAAGC | 177-183 | 2 | L. hispanicus | Microtubule-associated protein
ll2itg13638 | L2 | (TGT)9 | CCATGGTCATCATOACCCC | CGAGTCGAGTOGmACCC | 188-200 | 5 | L. hispanicus, L. mutabilis | F-box family protein
1 12itg26640 | L2 | (AG)7 | GGTCTGTOGAGAAGGCTACC | CCACCAATGGGTAGACATACG | 203-209 | 3 | L. hispanicus | Small nuclear ribonucleoprotein
1 12itg29887 | L2 | (GC^10 | CCCATCTGAAAGACmCGGC | TCCC^CATCCAGAGAGG | 243-249 | 2 | L. hispanicus | Ser/Thr-protein kinase AFC2
1 12itg50945 | L2 | (CCA)6(ACA)7 | CCAGAACAAGGAGAAGGTOC | TOTOTOCTCGCAGGC | 198-204 | 3 | L. hispanicus | Zinc finger transcription factor
1 12itg44905 | L2 | (C^9 | AAATCACAGAGCCAAGGAGG | TCAGCm^GmCCAAGC | 356-362 | 3 | L. hispanicus, L. mutabilis | Transcription factor
Il2itg09113 | L2 | (AT)8 | CATGACCCAATCTCAAACCC | gcatctggatctgc™atog | 341-343 | 2 | L. hispanicus | -
1 12itg03938 | L2 | (CCGA^9 | CATGTGGGAAGACCAGAAGC | ACTACGCGCTGCTAATGTCC | 212-290 | 7 | L. hispanicus, L. mutabilis | Polygalacturonase
Il2itg32421 | L2 | (AATCGG)8 | AGAGAAGTAGGCATGGTGGC | GATCGGCCTATOACTCAGC | 221-293 | 5 | L. hispanicus, L. mutabilis | -
Il2itg29217 | L1, L2 | (AT)7 | ACACTCTCAAGGAAAAGGGC | CCAmAACCGATAATGOTGG | 340-344 | 2 | L. hispanicus | Lactoylglutathione lyase
Il2itg27515 | L2 | (^017 | CATGCGTCCAATCTATCACC | AGTGGGAAACAAGGAAGTGG | 182-221 | 8 | L. hispanicus, L. mutabilis | PPR-containing protein
Il2itg41211 | L2 | (GAA)11 | TCCTCCTGCTOAGAACG | AAATCCACGTCATCAATCCG | 209-230 | 6 | L. hispanicus, L. mutabilis | -
Table 2 cDNA 454 assembly statistics of the L1, L2 and L1L2 L. luteus libraries

Library statistics | L1 | L2 | L1L2
Number of sequenced bases | 205,618,165 | 530,678,975 | 736,297,140
Number of reads | 755,206 | 1,468,202 | 2,213,408
Number of reads assembled | 604,869 | 1,345,892 | 1,964,517
Read average length (bp) | 276 | 361 | 332
Number of contigs | 26,975 | 43,674 | 71,655
Contig average size (bp) | 589 | 986 | 901
Number of isotigs | 21,235 | 35,191 | 55,309
Isotig average size (bp) | 589 | 986 | 901
Number of isogroups | 15,295 | 24,653 | 36,886
Isogroup average size (bp) | 589 | 989 | 905
Average number of reads per contig | 22.4 | 30.8 | 27.4
%GC | 30.7 | 39.9 | 37.5
Annotated sequences | - | - | 32,862
GBrowse mapped sequences | - | - | 25,400


between yellow lupin and Medicago. Thirty-three of the
79 (42%) primer pairs amplified clear PCR products, and 16
of these pairs produced amplicons of the sizes expected from the
Medicago genomic regions. The remaining primer pairs amplified
lupin fragments that were shorter or longer than the corresponding
M. truncatula regions. Amplicon sequence data for L. luteus
containing intergenic DNA sequence were mapped onto
the Medicago genome using blast (Figure 3). The alignments
between L. luteus and Medicago showed high
levels of conservation in the coding regions, but little
sequence similarity in the intergenic regions. When L.
hispanicus DNA was included as PCR template, only 23
primer pairs amplified. This variable amplification was likely
due to localized sequence polymorphism within the primer
binding sites (i.e., small indels) rather than a lack of
microsynteny. This ratio (23/33, about 70%) is similar to the
proportion of EST-SSRs that amplified fragments
in both species. Alignments between L. luteus and L. hispanicus
were possible in intergenic regions, but these
sequences were clearly less similar than the coding regions.

When these markers were evaluated on the screening
panel of diverse germplasm accessions, 10 showed length
polymorphism in these intergenic regions (Additional
file 1). In addition to EST-SSRs, these new Conserved
Microsynteny (CMS) markers could be a valuable resource
for marker-assisted crop improvement.

Identification of EST-SSRs 

A total of 2,572 isotig sequences contained at least one
EST-SSR, with a frequency of one SSR per 17.75 kilobases
(Table 3). The observed frequencies of di-, tri-,
tetra-, penta-, and hexa-nucleotide repeats were 30.4%, 52.7%, 2.4%,
7.5% and 6.2%, respectively (Table 4). Among the dinucleotide
repeats, the AT/TA motif was the most frequently
observed (49%), followed by GA/CT (45%). The
AC/GT motif was found at low frequency (6%), and
there were no CG/GC motifs in the Lupinus sequences.
Trinucleotide repeats were the most frequent repeat type
found in the Lupinus transcriptome and were predominantly
A/T-rich (74.5%). These trinucleotide repeats were often
found within the coding sequences of putative genes (77.2%).
The GAA/CTT motif was the most frequent trinucleotide repeat (31%).
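Repeat motifs are usually tallied by class, treating a motif, its rotations and its reverse complement as equivalent, so that GA, AG, TC and CT all fall into one class. The Python sketch below shows one way to do this, similar in spirit to the classified repeat types reported by MISA; it names each class by the alphabetically first rotation on each strand (so the GA/CT class above appears as AG/CT, and GAA/CTT as AAG/CTT). It also shows the SSR density calculation as total sequence length divided by the number of SSRs. The function names and inputs are hypothetical.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_rotation(unit: str) -> str:
    """Alphabetically first rotation of a repeat unit (e.g. 'GAA' -> 'AAG')."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def motif_class(unit: str) -> str:
    """Strand- and phase-independent class label such as 'AG/CT'."""
    unit = unit.upper()
    a, b = sorted((canonical_rotation(unit),
                   canonical_rotation(reverse_complement(unit))))
    return f"{a}/{b}"


def class_frequencies(units: Iterable[str]) -> Counter:
    return Counter(motif_class(u) for u in units)


def kb_per_ssr(total_bp: int, n_ssrs: int) -> float:
    """Average spacing: kilobases of sequence scanned per SSR found."""
    if n_ssrs <= 0:
        raise ValueError("no SSRs found")
    return total_bp / n_ssrs / 1000.0
```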

Evaluation of EST-SSRs within yellow lupin and other lupin species

Studies of repeat size and level of polymorphism
have suggested a positive correlation between repeat
number and polymorphism rate, especially in
dinucleotide microsatellites [28,42]. Thus, only EST-SSRs
containing at least 7 repeat units were selected for
validation, to increase the likelihood of finding markers
polymorphic among lupin accessions. A total of 783
EST-SSR candidate loci had sufficient repeat units, but
only 375 had enough flanking sequence to be
suitable for primer design. PCR amplification of these
markers identified 222 EST-SSRs (59%) that were polymorphic
among the six diverse L. luteus accessions included in the
screening panel; 130 EST-SSRs were monomorphic and
23 primer pairs failed to amplify. Six
EST-SSRs were validated by Sanger sequencing. The
amplicon sequences from four different L. luteus genotypes
and from L. hispanicus and L. mutabilis confirmed
the existence of SSR motifs and their length variability
among lupin accessions (Figure 4). EST-SSR amplicons
from both of the other Lupinus species showed high conservation
in the regions flanking the SSR when compared with L. luteus.
However, several indels were observed in adjacent
regions and within the SSR motif, especially in L.
mutabilis.
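Each candidate marker tested on the screening panel ends up in one of three groups: polymorphic, monomorphic, or failed to amplify. A minimal Python sketch of that classification is shown below, assuming each accession is scored with a single allele letter per marker and None when no product is obtained; the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def classify_marker(calls: Sequence[Optional[str]]) -> str:
    """Classify one primer pair from its allele calls across the screening panel."""
    scored = {c for c in calls if c is not None}
    if not scored:
        return "failed"  # no amplification in any accession
    return "polymorphic" if len(scored) > 1 else "monomorphic"


def screening_summary(panel: Dict[str, Sequence[Optional[str]]]) -> Dict[str, int]:
    counts = {"polymorphic": 0, "monomorphic": 0, "failed": 0}
    for calls in panel.values():
        counts[classify_marker(calls)] += 1
    return counts
```

With the counts reported above (222 polymorphic, 130 monomorphic and 23 failed out of 375 primer pairs), the polymorphic fraction is 222/375, about 59%.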

Fifty polymorphic EST-SSRs were used to genotype a
sample of 64 L. luteus accessions (Table 1 and Additional
file 2). Twenty-four of these selected markers were specific
to L1 (leaf-flower EST library), 20 EST-SSRs were
specific to L2 (seed EST library), and 6 were present in
both libraries. Neighbor-joining distance analysis
detected several clusters among L. luteus accessions,
strongly suggesting the existence of population subdivisions
(Figure 5). However, no clear geographical patterns
(country of origin) were observed among lupin accessions.
Interestingly, Chilean accessions were distributed
in most clusters, probably reflecting the breeding history
of these genotypes. Two hundred and fifty-four (65.7%)
and 113 (30%) SSR primer pairs were able to amplify



Parra-Gonzalez et al. BMC Genomics 2012, 13:425
http://www.biomedcentral.com/1471-2164/13/425






Figure 1 GO term annotations for L1L2. Isotigs were grouped under three categories: (a) molecular function, (b) biological processes, and (c) cellular components. Numbers in parentheses indicate the number of positive matches for each function.
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429 ^4 



Medicago truncatula (68,848) 



Arabidopsis thaliana (11 2,827) 




Lotus japonicus (47,486) 



Glycine max 
(135,186) 



Figure 2 Venn diagram summarizing the distribution of tBIastX matches between L luteus and four model species (A thaliana, M. 
truncatula, L. japonicus and G. max). Numbers following the model species correspond to the size of the respective data base. Numbers within 
the Venn diagram indicate the number of sequences sharing similarity using tBLASTx. Numbers within parenthesis indicate the percentage of 
matches in terms of the total number of L. luteus sequences. 



fragments from L hispanicus and L mutabilis DNA, 
respectively. 
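This section does not state which genetic distance underlies the neighbor-joining analysis of the 64 accessions. As a minimal sketch, assuming diploid SSR genotypes scored as pairs of fragment sizes, the Python below computes a proportion-of-shared-alleles distance (1 - PSA) between accessions. The genotype layout, the accession labels and the fragment sizes are hypothetical illustrations, not data from the study.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: one (allele1, allele2) fragment-size pair per SSR locus;
# None marks a failed amplification (missing data).
Genotype = List[Optional[Tuple[int, int]]]


def shared_allele_distance(a: Genotype, b: Genotype) -> float:
    """Return 1 - proportion of shared alleles over loci scored in both accessions."""
    shared = 0
    scored = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip loci with missing data in either accession
        # Multiset intersection: (150, 152) vs (150, 150) share exactly one allele.
        shared += sum((Counter(ga) & Counter(gb)).values())
        scored += 1
    if scored == 0:
        raise ValueError("no loci scored in both accessions")
    return 1.0 - shared / (2 * scored)


def distance_matrix(genotypes: Dict[str, Genotype]) -> Dict[Tuple[str, str], float]:
    """Distances for every unordered pair of accessions, keyed by sorted label pair."""
    return {
        (x, y): shared_allele_distance(genotypes[x], genotypes[y])
        for x, y in combinations(sorted(genotypes), 2)
    }


if __name__ == "__main__":
    # Toy genotypes for three hypothetical accessions at three loci.
    toy = {
        "L01": [(150, 150), (212, 218), (98, 98)],
        "L02": [(150, 152), (212, 212), None],
        "L03": [(156, 156), (220, 220), (101, 101)],
    }
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

Loci that failed in either accession are skipped, so accessions with more missing data are compared over fewer loci. Other SSR distances, such as Nei's distance or the Cavalli-Sforza chord distance, would be equally reasonable choices for this step.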

Discussion 

Next-generation sequencing has reduced the existing gap between the genomic platforms of major crops and the limited resources that are currently available for orphan crops [10]. Complete transcriptome sequencing has enabled the development of species-specific molecular markers, in silico expression analyses, gene discovery, and the inference of phylogenetic relationships [43,44].

In this research, we used 454 cDNA sequences to assemble the transcriptomes of two tissues (L1 and L2) of yellow lupin. We recovered a large number of previously unknown and uncharacterized yellow lupin gene sequences (Table 2). The total number of sequences for the combined library was mostly additive from L1 and L2. The L1 library favored the inclusion of longer 3'UTR regions and thus reduced the amount of coding sequence available for assembling longer combined contigs (L1L2). As a consequence, two or more sequences belonging to the same transcript may not be assembled together, causing an overestimation of the number of expressed sequences. The larger amount of 3'UTR regions in L1 is also in agreement with its lower GC content, a condition typically associated with untranslated regions [45,46]. Undoubtedly, a number of expressed sequences are tissue specific and will not assemble into combined contigs. For instance, several genes related to seed dormancy and germination are not expressed in vegetative and floral tissues [47,48]. The same specificity has been observed in a number of tissues and plant species [49-51]. The assembly of L1L2 generated 55,309 isotigs, of which 30,811 had similarity to putative proteins found in other plant species. Comparative studies carried out against L. japonicus, M. truncatula and G. max showed a total of 31,520 lupin sequences similar to at least one of the model legume databases, and 22,219 were similar to all of them. Lotus and Medicago belong to the Galegoid subclade, which includes mostly temperate legume species [52]. Glycine is a member of the Phaseoloid subclade, which comprises mostly tropical species [52]. Lupins belong to the Genistoid subclade, which is sister (and distant) to most of the described Papilionoid subclades, especially those containing most domesticated species [53].

Figure 3 Microsyntenic L. luteus DNA fragments mapped on the Medicago genome using a GBrowse platform. (a) L. luteus microsyntenic region 13 on M. truncatula chromosome 1; (b) L. luteus microsyntenic region 5 on M. truncatula chromosome 1; (c) L. luteus microsyntenic region 11 on M. truncatula chromosome 2.
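The proportions implied by the isotig counts in the paragraph above can be checked with a few lines of Python. Only the counts come from the text; the constant names and the `percent` helper are illustrative.

```python
TOTAL_ISOTIGS = 55_309  # isotigs in the combined L1L2 assembly

# Isotig counts reported in the text for each similarity category.
COUNTS = {
    "similar to putative proteins from other plant species": 30_811,
    "similar to at least one model legume database": 31_520,
    "similar to all model legume databases": 22_219,
}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


for label, n in COUNTS.items():
    print(f"{label}: {n:,}/{TOTAL_ISOTIGS:,} = {percent(n, TOTAL_ISOTIGS)}%")
```

The script prints 55.71%, 56.99% and 40.17% for the three categories, in that order.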

Although micro-repeat motifs are frequent in plant genomes and their respective transcriptomes, the frequency of SSR discovery depends on the search criteria [42,54-56]. We analyzed 55,309 lupin isotig sequences using MISA and identified 2,796 SSR motifs with an average frequency of one SSR per 17.75 kbp. Tri-nucleotide repeats were the motifs most frequently found in L. luteus expressed sequences. Similar results have been reported in numerous plant species [26,28,54,55,57]. The abundance of trimeric EST-SSRs has been attributed to the absence of frameshift mutations when there is length variation in these SSRs [58]. Indeed, 1,435 EST-SSRs were discovered within coding regions of genes. Among tri-nucleotide repeats, AT-rich motifs were the most predominant (74.5%), as has also been observed in soybean, Citrus and Arabidopsis [54,57]. For di-nucleotide repeats, AT was the most frequently observed motif, contrasting with results from Arabidopsis, soybean, maize, rice, wheat and barley, where AC/GT were the most frequent repeats [26,28,54,55,57]. The high proportion of untranslated sequences (specifically 3'UTR), mainly contributed by L1, could explain the bias toward A/T-rich repeat sequences observed in yellow lupin. There were no CG repeats in the lupin sequences, similar to results obtained in barrel medic [24], rice, corn, soybean [57], wheat [27], sorghum [25], Arabidopsis, apricot and peach [59].

Table 3 Features of EST-SSRs identified in the assembled L1L2 L. luteus library

Total number of examined sequences                   55,309
Estimated transcriptome screened (kbp)               49,841
Number of sequences containing SSRs                   2,572
Number of identified SSRs                             2,774
Number of EST-SSRs in coding regions                  1,435
Number of sequences containing more than 1 SSR          147
Number of SSRs present in compound formation            195
Frequency of SSRs in transcriptome                 1/18 kbp

Table 4 Distribution of repeat types and number of repeats within the L1L2 L. luteus library

Repeat type        Number of repeat units                          Total (%)
                   4      5      6      7      8      9      ≥10
Di-nucleotide      -      -      363    204    120    72     91      851 (30.7)
Tri-nucleotide     -      826    369    131    69     25     57      1477 (53.2)
Tetra-nucleotide   -      43     9      3      1      2      8       66 (2.4)
Penta-nucleotide   129    46     6      3      9      6      12      209 (7.5)
Hexa-nucleotide    105    26     11     3      9      5      13      171 (6.2)

We used GBrowse to visualize lupin ESTs aligned to the M. truncatula chromosomes (Figure 3). This approach potentially identifies paralogous sequences and allows color-coded alignment by BLAST significance [60]. A total of 25,400 L. luteus contigs were localized and found to be distributed across the entire Medicago genome, with chromosomes Mt1 and Mt3 having the highest number of gene matches. Each yellow lupin sequence was mapped to an average of 3.7 locations, which may correspond in part to rounds of genome duplication previously described for the Medicago genome [61]. Understanding syntenic relationships among species is essential to exploit the available tools developed for comparative genomic analysis. Using this approach, we created a new method of developing molecular markers: markers based on conserved microsynteny (CMS) between orphan and model species. Genome comparisons among M. truncatula, G. max and L. japonicus have shown that, in general, most genes in Papilionoid legume species are likely to be found within a relatively long syntenic region of any other Papilionoid species [62]. Positive amplification and sequencing of L. luteus intergenic regions, based on PCR primers located on adjacent M. truncatula genes, suggested the existence of microscale synteny between these legume species. Roughly 40% of the targeted intergenic L. luteus regions amplified, which points to the usefulness of conserved legume chromosome blocks for genomic studies of orphan crops. Failed amplification of some primer pairs could be a consequence of non-synteny, but other technical limitations could also explain negative PCR results. For instance, non-coding DNA regions are known to be highly variable among species [63,64], and negative PCR amplifications could easily be due to excessively long L. luteus intergenic regions.

[Figure 4: sequence alignments for (a) itg03739, (b) itg16318 and (c) itg21236, each comparing six sequences labelled _18, _98, _104, _194, _hisp and _muta.]

Figure 4 Alignment of L. luteus, L. hispanicus and L. mutabilis sequences containing several repeat motifs. (a) isotig03739 with GA and AGA motifs; (b) isotig16318 with a TAA motif; and (c) isotig21236 with a GAA motif.

Figure 5 Neighbour-joining tree relating the 64 L. luteus accessions included in the diversity study. Numbers above branches correspond to bootstrap values. Accessions are identified by the letter L followed by numbers. Labels around accessions identify the country of origin based on seed bank or breeding histories (RUS: Russia, ISRL: Israel, HUNG: Hungary, CHIL: Chile, GER: Germany, SPN: Spain, PORT: Portugal, MORO: Morocco, POL: Poland, BYS: Belarus, UKR: Ukraine). The scale is in distance units.
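The CMS idea described above, anchoring primers in lupin isotigs that match neighboring Medicago genes so that the product spans the intergenic region, can be sketched as a simple filter over BLAST-style hits. The `Hit` record layout, the isotig and gene names, and the `max_gap` cut-off are hypothetical; the sketch only looks at genes consecutive among the matched ones and does not check whether unmatched annotated genes lie in between. `mean_locations` computes the kind of per-isotig average reported in the text (3.7 locations), here on toy data.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One match of a lupin isotig to an annotated Medicago gene (hypothetical layout)."""
    isotig: str  # L. luteus isotig identifier
    chrom: str   # M. truncatula chromosome, e.g. "Mt1"
    gene: str    # matched M. truncatula gene
    start: int   # gene start on the chromosome (bp)
    end: int     # gene end on the chromosome (bp)


def mean_locations(hits: List[Hit]) -> float:
    """Average number of distinct Medicago genes matched per lupin isotig."""
    loci: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for h in hits:
        loci[h.isotig].add((h.chrom, h.gene))
    return sum(len(s) for s in loci.values()) / len(loci)


def cms_candidates(hits: List[Hit], max_gap: int = 3000) -> List[Tuple[Hit, Hit, int]]:
    """Consecutive matched genes on one chromosome, hit by different isotigs and
    separated by a short intergenic gap; returns (left, right, gap_bp) tuples."""
    by_chrom: Dict[str, List[Hit]] = defaultdict(list)
    for h in hits:
        by_chrom[h.chrom].append(h)
    pairs = []
    for chrom_hits in by_chrom.values():
        chrom_hits.sort(key=lambda h: h.start)
        for left, right in zip(chrom_hits, chrom_hits[1:]):
            gap = right.start - left.end  # Medicago intergenic distance
            if left.isotig != right.isotig and 0 < gap <= max_gap:
                pairs.append((left, right, gap))
    return pairs


if __name__ == "__main__":
    # Toy hits, not data from the study.
    hits = [
        Hit("itgA", "Mt1", "g1", 1000, 2000),
        Hit("itgB", "Mt1", "g2", 2500, 3500),
        Hit("itgC", "Mt1", "g3", 9000, 9500),
        Hit("itgA", "Mt3", "g7", 100, 900),
    ]
    print(round(mean_locations(hits), 2))
    for left, right, gap in cms_candidates(hits):
        print(left.isotig, left.gene, right.isotig, right.gene, gap)
```

The gap only approximates the intergenic part of the expected Medicago amplicon; as noted above, the corresponding lupin region may be much longer, which is one reason a candidate can fail to amplify.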

Few studies have reported the use of EST-SSRs in Lupinus species [19,21,22]. Most efforts have focused on genetic linkage mapping and diversity studies in L. angustifolius [20], L. albus [21] and L. luteus [22]. To validate our L. luteus polymorphic markers, we tested 50 EST-SSRs on a population of 64 L. luteus genotypes. An analysis of genotypic diversity illustrated the existence of several clusters within L. luteus germplasm. The lack of a clear pattern following the geographical origin (country) of the accessions could be explained by three reasons. 1) The number of accessions may not have been large enough to allow a clear pattern to emerge. 2) L. luteus is widely distributed across the Mediterranean region, mainly due to human introductions [6]. This situation could have homogenized natural genetic distinctiveness, leaving mostly population subdivisions based on breeding histories. 3) Finally, it is possible that some accessions were misclassified, thus obscuring an existing geographical clustering pattern.
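The clusters discussed above come from a neighbor-joining tree (Figure 5). As a minimal sketch of the Saitou-Nei neighbor-joining algorithm, the Python below builds an unrooted tree in Newick format from a dictionary of pairwise distances. It is not the software the authors used, it does not compute the bootstrap values shown in Figure 5, and the five-taxon matrix in the example is a textbook toy case rather than lupin data.

```python
from itertools import combinations
from typing import Dict, Tuple

Matrix = Dict[str, Dict[str, float]]


def from_pairs(pairs: Dict[Tuple[str, str], float]) -> Matrix:
    """Expand {(a, b): distance} into a symmetric dict-of-dicts without a diagonal."""
    m: Matrix = {}
    for (a, b), v in pairs.items():
        m.setdefault(a, {})[b] = v
        m.setdefault(b, {})[a] = v
    return m


def neighbor_joining(dist: Matrix) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree as a Newick string."""
    if len(dist) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # Work on a copy (dropping any diagonal entries) so the caller's matrix is untouched.
    d = {a: {b: v for b, v in row.items() if b != a} for a, row in dist.items()}
    while len(d) > 3:
        n = len(d)
        r = {a: sum(row.values()) for a, row in d.items()}  # net divergence per node
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j);
        # ties go to the first pair in iteration order.
        i, j = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        dij = d[i][j]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch length from i to new node
        lj = dij - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distances from the new internal node u to every remaining node.
        du = {k: (d[i][k] + d[j][k] - dij) / 2 for k in d if k not in (i, j)}
        del d[i], d[j]
        for k, row in d.items():
            del row[i], row[j]
            row[u] = du[k]
        d[u] = du
    # Resolve the last three nodes as a trifurcation (the tree is unrooted).
    a, b, c = d
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    return f"({a}:{la:.3f},{b}:{d[a][b] - la:.3f},{c}:{d[a][c] - la:.3f});"


if __name__ == "__main__":
    toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
           ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
           ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    print(neighbor_joining(from_pairs(toy)))
    # (d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);
```

Because the toy matrix is additive, the recovered branch lengths reproduce it exactly; with real SSR distances the tree is only an approximation, and bootstrap support would be obtained by resampling loci and rebuilding the tree many times.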

We observed that a high number of yellow lupin EST-SSRs amplified fragments in two other lupin species, L. hispanicus and L. mutabilis (Table 1). The larger number of markers transferable to L. hispanicus than to L. mutabilis confirmed the closer genetic relationship between L. luteus and L. hispanicus [5,65]. These two closely related species have the same chromosome number (2n = 52) and are still interfertile, generating a natural hybrid called hispanicoluteus [66]. Phylogenetic studies have placed New World and Old World lupins into two different clades [5,65,67]. Thus, since L. mutabilis (2n = 48), the only cultivated New World lupin [65], belongs to the other clade, most EST-SSRs that amplified in it should also have high transferability rates to other lupin species, such as L. albus and L. angustifolius. Understanding the genetic diversity among other closely related lupin species will facilitate the transfer of favorable variation into cultivated species. For instance, L. hispanicus has been suggested as a reservoir of favorable variation for a number of biotic and abiotic stresses currently affecting L. luteus [68,69].

Conclusion 

Deep transcriptome sequencing of L. luteus will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw material for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparison studies, and putative candidate genes for QTL detection. We are also exploiting the microsyntenic regions observed between L. luteus and legume model species to saturate yellow lupin linkage maps by amplifying conserved regions across legume species. The utilization of these tools will help transform L. luteus into a viable temperate legume crop alternative.

Additional files 



Additional file 1: Table S1. Characteristics of 33 Conserved Microsynteny (CMS) markers developed in L. luteus. Shown for each primer pair are the Medicago chromosome, library specificity, L1L2 isotigs where CMS forward and reverse primers were anchored, forward and reverse sequences, expected Medicago amplicon size (bp), L. luteus CMS amplicon size (bp), amplification in other lupin species (L. hispanicus), and the level of polymorphism on the L. luteus screening panel.

Additional file 2: Table S2. Lupinus luteus, L. hispanicus and L. mutabilis accessions included in the study.